Team Braavos - Predicting and Visualizing College Admissions

Overview and Motivation

Every year, four million students apply to US colleges without a good idea of their chances of getting in. Fueled by the US News rankings, colleges inflate their rejection rates while myopically selecting students through the narrow lens of standardized test scores. This project provides data-science-based probabilities of admission along with interactive visualizations. Students and parents can investigate how specific aspects of an application affect the chances of acceptance at various schools, and so focus their time and money where it matters most. Users can also view summaries of application and acceptance data to draw further inferences about their own likely success at different colleges. Our broader aim is to disrupt the application process so that strong students are matched to the right school and schools can achieve their desired mix of students.

Team and Roles

Team Braavos consists of:

  • Marina Adario
  • Dion Hagan
  • Malcolm Mason Rodriguez
  • David Wihl

We anticipate being a fully agile team without predefined roles. Anyone can submit, check in, or work on any story.

The team maintains a Kanban board in Trello. The "To Do" column is a regularly triaged and prioritized list of upcoming tasks. When a team member completes a task, he or she takes the next item from the top of the list.

Communication Rules:

We collaborate regularly during the week via Slack and use email only when necessary. Electronic signatures suffice when signatures are required for submission.

Physical meetings occur once per week on Wednesday either during or just after Studio as necessary.

Collaboration Policy

The entire project is version-controlled on GitHub. Any non-trivial story gets its own branch, with regular check-ins and merges. Because we aim to be fully agile, any member can work on any story. Each story should be short, taking no more than two days. Blocked stories are marked in Trello with a red label.

Elevator Pitch

“Every year, four million students apply to US colleges without a good idea of their chances of getting in. Fueled by the US News rankings, colleges inflate their rejection rates while myopically selecting students through the narrow lens of standardized test scores. ChanceMe provides data-science-based probabilities of admission along with interactive visualizations. Students and parents can investigate how specific aspects of an application affect the chances of acceptance at various schools, and so focus their time and money where it matters most. Users can also view summaries of application and acceptance data to draw further inferences about their own likely success at different colleges. Our broader aim is to disrupt the application process so that strong students are matched to the right school and schools can achieve their desired mix of students.”

Feature List

The visualization will consist of at least two pages.

Page 1 - See Your Chances

This will be the home page of the site. There will be two areas for data entry:

  • Area 1 - demographic data that cannot be changed, including gender, US citizenship, first to attend college, etc.
  • Area 2 - college admission factors that can still change before the application is submitted, such as GPA, standardized test scores, and number of AP exams taken.

As the applicant changes values in Area 2, the visualization will update. The intent is for this to be highly interactive, almost game-like, encouraging the applicant to try many different scenarios.
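The split between fixed and adjustable inputs, and the "try many scenarios" interaction, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the site's implementation: the field names (`gender`, `us_citizen`, `first_generation`, `gpa`, `sat`, `ap_exams`), the helper `what_if`, and the `model` callable are all hypothetical stand-ins for whatever the trained classifier actually consumes. Area 1 is modelled as an immutable record, Area 2 as a record that is copied with one field changed for each scenario.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Demographics:
    """Area 1 (hypothetical fields): values the applicant cannot change."""
    gender: str
    us_citizen: bool
    first_generation: bool


@dataclass(frozen=True)
class AdmissionFactors:
    """Area 2 (hypothetical fields): values the applicant can still change."""
    gpa: float
    sat: int
    ap_exams: int


# Assumed interface: any model mapping (demographics, factors, school) to P(accept).
Model = Callable[[Demographics, AdmissionFactors, str], float]


def what_if(model: Model, demo: Demographics, factors: AdmissionFactors,
            school: str, field: str, values: List[float]) -> List[Tuple[float, float]]:
    """Re-score the applicant once per candidate value of a single Area 2 field."""
    results = []
    for value in values:
        # replace() returns a new record; the applicant's current inputs are untouched
        scenario = replace(factors, **{field: value})
        results.append((value, model(demo, scenario, school)))
    return results


# Usage with a placeholder scoring rule (illustration only, not a trained model)
def toy_model(demo: Demographics, f: AdmissionFactors, school: str) -> float:
    return min(1.0, max(0.0, (f.gpa - 2.0) / 2.0))


demo = Demographics("F", True, False)
factors = AdmissionFactors(gpa=3.0, sat=1300, ap_exams=3)
print(what_if(toy_model, demo, factors, "Example College", "gpa", [3.0, 3.5, 4.0]))
# prints [(3.0, 0.5), (3.5, 0.75), (4.0, 1.0)]
```

Keeping Area 1 frozen mirrors the page design: only Area 2 values are swept, so each point on an interactive chart corresponds to one re-scored scenario.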

Page 2 - Data Drill Down

The second page will have several linked visualizations that let the user drill down into the specific factors that weigh into the college acceptance process. This allows the applicant to compare and contrast different schools and to plan time for the activities most likely to maximize their chances of acceptance.
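One way the drill-down data could be prepared is to aggregate raw application records into per-school acceptance rates within bins of a chosen factor, which the linked charts can then filter by school. This is a minimal sketch, assuming hypothetical records of the form `(school, gpa, accepted)` and a default bin width of 0.5 GPA points; neither the schema nor the helper name `acceptance_by_bin` comes from the project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, bool]  # (school, gpa, accepted); hypothetical schema


def acceptance_by_bin(records: Iterable[Record],
                      bin_width: float = 0.5) -> Dict[str, Dict[float, float]]:
    """Return {school: {bin_lower_edge: acceptance_rate}}, bins sorted ascending."""
    # counts[school][bin] holds [accepted_count, total_count]
    counts: Dict[str, Dict[float, List[int]]] = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for school, gpa, accepted in records:
        lower_edge = (gpa // bin_width) * bin_width
        cell = counts[school][lower_edge]
        cell[0] += int(accepted)
        cell[1] += 1
    return {
        school: {edge: acc / total for edge, (acc, total) in sorted(bins.items())}
        for school, bins in counts.items()
    }
```

Comparing two schools then amounts to reading the same bin from each school's dictionary, which is what a side-by-side drill-down chart would plot.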

Summary of Features

  • Demographic data entry
  • Admission factors data entry
  • School selection
  • Acceptance Probability Visualization
  • Drill down visualizations
  • College comparison visualizations
  • School selection for drill down visualizations

Project Storyboard

from IPython.display import Image
Image(filename='img/Final_project_story_board.png')

Tasks and Timeline

(This is for a general idea only. The reference tasks and timeline are in Trello.)

Target:
  • ✔ Choose domain (completed 3/28)
  • ✔ Define question (completed 3/28)
  • ✔ Explore existing solutions (completed 3/28)
  • Formulate data analysis tasks (no longer needed)
Data Wrangling:
  • ✔ Find and clean data (completed 3/28)
  • ✔ Exploratory Data Analysis (completed 3/28)
  • ✔ Transform and summarize data (completed 3/28)
Design:
  • Design Visual Encoding
  • ✔ Design Interaction - Prediction (completed 4/4)
  • ✔ Design layout and storytelling - Prediction (completed 4/4)
  • Design Interaction - Drill down (postponed due to redesign; now due 4/18)
  • ✔ Perform 'paper' user testing (completed in studio 4/6)
Implement:
  • ✔ Rapid prototype - Drill down (initial prototype complete 4/11)
  • ✔ Design system architecture (completed 4/11)
  • ✔ Rapid prototype - Prediction (due 4/18)
  • Innovative visualization: Distorted map of acceptance distance (due 4/25)
  • Linked visualizations (due 4/25)
  • Define Data Structures (no longer needed)
Evaluate:
  • Perform user testing with prototype (due 4/20)
  • Is the abstraction right? (due 4/20)
  • Does encoding and interaction support the task? (due 4/21)
  • Does encoding and interaction provide new insights? (due 4/21)
Deliverables:
  • Process Book (due 5/2)
  • Screencast (due 5/2)
  • Demos / design fair (due 5/4)
Time Permitting / Nice to Have:
  • Performance optimizations:
    • Determine bottlenecks and explore efficient algorithms
    • Random Forest in JavaScript (the current web service is very slow)
    • ✔ cache the CSV (collegelist and the data) in localStorage (completed 4/4)
    • train the Random Forest asynchronously
    • populate the list of colleges from the CSV (or cache) instead of hard-coding it
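The caching and college-list items can be illustrated in Python, with the caveat that the site itself would do this in JavaScript with localStorage; this is only a sketch of the logic. It assumes a hypothetical CSV with a header row containing a `college` column, and uses `functools.lru_cache` as a stand-in for the browser cache so the file is parsed at most once per path.

```python
import csv
import io
from functools import lru_cache
from typing import Set, Tuple


def college_names(csv_text: str, column: str = "college") -> Tuple[str, ...]:
    """Unique, sorted, non-empty college names from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names: Set[str] = set()
    for row in reader:
        # row.get() returns None if the column is missing from a short row
        name = (row.get(column) or "").strip()
        if name:
            names.add(name)
    return tuple(sorted(names))


@lru_cache(maxsize=None)
def load_college_list(path: str) -> Tuple[str, ...]:
    """Read and parse the file on the first call; later calls return the cached tuple."""
    with open(path, newline="", encoding="utf-8") as fh:
        return college_names(fh.read())
```

Returning a tuple rather than a list keeps the cached value immutable, so no caller can accidentally modify the shared college list.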